\[
\frac{\partial L^{F}_{\mathrm{MSE}}}{\partial w_i}
= \mu\,(a_i - a^{*}_{H})\,\frac{\partial a_i}{\partial w_i}\, I(i \in \mathcal{L}),
\tag{6.23}
\]
where $I(\cdot)$ is an indicator function defined as
\[
I(i \in \mathcal{L}) =
\begin{cases}
1, & \text{the $i$-th layer is supervised with FR-GAL}, \\
0, & \text{otherwise}.
\end{cases}
\tag{6.24}
\]
As mentioned above, we employ several FR-GALs in the training process; hence $I(i \in \mathcal{L})$ indicates whether the $i$-th layer is supervised with an FR-GAL. Note that FR-GAL is only used to supervise the low-level features, so no gradient is propagated to the high-level features.
In this way, we compute the full gradient with respect to $w_i$ and update it as
\[
w_i \leftarrow w_i - \eta_1 \delta_{w_i},
\tag{6.25}
\]
where $\eta_1$ is the learning rate for $w_i$.
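To make the step concrete, below is a minimal PyTorch-style sketch of the $w_i$ update, assuming autograd supplies the chain-rule factor $\partial a_i/\partial w_i$; the names `w_list`, `feats`, `feats_hr`, `supervised_layers`, and `task_loss` are illustrative assumptions, not identifiers from BiRe-ID.

```python
import torch

def step_w(w_list, feats, feats_hr, supervised_layers, task_loss,
           mu=1e-3, eta1=1e-2):
    """One SGD step on the latent weights w_i, Eqs. (6.23)-(6.25).

    feats[i] is the low-level feature a_i of the 1-bit model and
    feats_hr[i] the real-valued reference a_H^*; the FR-GAL MSE term
    is added only when I(i in L) = 1, Eq. (6.24).
    """
    loss = task_loss
    for i, (a, a_star) in enumerate(zip(feats, feats_hr)):
        if i in supervised_layers:  # indicator I(i in L)
            # 0.5 * mu * ||a_i - a_H^*||^2; its gradient w.r.t. w_i is
            # mu * (a_i - a_H^*) * da_i/dw_i, matching Eq. (6.23)
            loss = loss + 0.5 * mu * ((a - a_star.detach()) ** 2).sum()
    grads = torch.autograd.grad(loss, w_list, retain_graph=True)
    with torch.no_grad():
        for w, g in zip(w_list, grads):
            w -= eta1 * g  # w_i <- w_i - eta_1 * delta_w_i, Eq. (6.25)
```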
Update $\alpha_i$: We further update the learnable matrix $\alpha_i$ with $w_i$ fixed. Let $\delta_{\alpha_i}$ be the gradient of $\alpha_i$; we then have
\[
\delta_{\alpha_i} = \frac{\partial L}{\partial \alpha_i}
= \frac{\partial L_S}{\partial \alpha_i}
+ \frac{\partial L^{K}_{\mathrm{Adv}}}{\partial \alpha_i}
+ \frac{\partial L^{K}_{\mathrm{MSE}}}{\partial \alpha_i}
+ \frac{\partial L^{F}_{\mathrm{Adv}}}{\partial \alpha_i}
+ \frac{\partial L^{F}_{\mathrm{MSE}}}{\partial \alpha_i},
\tag{6.26}
\]
and
\[
\alpha_i \leftarrow \alpha_i - \eta_2 \delta_{\alpha_i},
\tag{6.27}
\]
where $\eta_2$ is the learning rate for $\alpha_i$. Furthermore,
\[
\frac{\partial L^{K}_{\mathrm{Adv}}}{\partial \alpha_i}
= -\frac{1}{1 - D(\alpha_i \circ b_{w_i}; W_D)}\,
\frac{\partial D}{\partial (\alpha_i \circ b_{w_i})}\, b_{w_i},
\tag{6.28}
\]
\[
\frac{\partial L^{K}_{\mathrm{MSE}}}{\partial \alpha_i}
= -\lambda\,(w_i - \alpha_i \circ b_{w_i})\, b_{w_i},
\tag{6.29}
\]
\[
\frac{\partial L^{F}_{\mathrm{Adv}}}{\partial \alpha_i}
= -\frac{1}{1 - D(a_i; W_D)}\,
\frac{\partial D}{\partial a_i}\,
\frac{\partial a_i}{\partial \alpha_i}\, I(i \in \mathcal{L}),
\tag{6.30}
\]
\[
\frac{\partial L^{F}_{\mathrm{MSE}}}{\partial \alpha_i}
= \mu\,(a_i - a^{*}_{H})\,
\frac{\partial a_i}{\partial \alpha_i}\, I(i \in \mathcal{L}).
\tag{6.31}
\]
Update $p_i$: Finally, we update the remaining parameters $p_i$ with $w_i$ and $\alpha_i$ fixed. Let $\delta_{p_i}$ denote the gradient of $p_i$:
\[
\delta_{p_i} = \frac{\partial L_S}{\partial p_i},
\tag{6.32}
\]
\[
p_i \leftarrow p_i - \eta_3 \delta_{p_i},
\tag{6.33}
\]
where $\eta_3$ is the learning rate for the other parameters. These derivations show that the refining process can be trained end to end. The training process of our BiRe-ID is summarized in Algorithm 13. We update each group of parameters independently while keeping the other parameters of the convolutional layers fixed, which enhances the variation of the feature maps in every layer. In this way, we accelerate the convergence of training and fully exploit the potential of our 1-bit networks.
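A condensed sketch of this alternating scheme (in the spirit of Algorithm 13, which we do not reproduce here) is given below; the attribute names `latent_weights`, `scales`, and `other_params` and the two loss callbacks are assumptions for illustration. Each group is stepped with its own learning rate while the others stay fixed, and only $w_i$ and $\alpha_i$ receive the GAL terms, since $p_i$ sees only $L_S$ per Eq. (6.32).

```python
import torch

def train_step(model, batch, task_loss_fn, gal_loss_fn,
               eta1=1e-2, eta2=1e-3, eta3=1e-2):
    """One alternating update over the three parameter groups."""
    groups = [
        (model.latent_weights, eta1, True),   # w_i,     Eq. (6.25)
        (model.scales,         eta2, True),   # alpha_i, Eq. (6.27)
        (model.other_params,   eta3, False),  # p_i,     Eq. (6.33)
    ]
    for params, eta, use_gal in groups:
        loss = task_loss_fn(model, batch)            # L_S
        if use_gal:
            loss = loss + gal_loss_fn(model, batch)  # kernel/feature GAL
        grads = torch.autograd.grad(loss, params)    # others held fixed
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= eta * g
```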